DETAILED ACTION 

1. This Office Action is in response to the Arguments filed on 03/25/2009. Claims 1, 
3-11, 13, 15-17, and 22-24 remain pending. All mentioned claims have been 
examined. The Applicants' remarks have been carefully considered but they are not 
persuasive and do not place the case in condition for allowance. 

2. All previous objections and rejections directed to the Applicant's disclosure and 
claims not discussed in this Office Action have been withdrawn by the Examiner. 

Continued Examination Under 37 CFR 1.114 

3. A request for continued examination under 37 CFR 1.114, including the fee set 
forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this 
application is eligible for continued examination under 37 CFR 1.114, and the fee set 
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action 
has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 
03/25/2009 has been entered. 

Response to Amendments and Arguments 

4. Applicant's arguments (pages 8-11) filed on 03/25/2009 with regard to claims 1, 
2-11, 13, 15-17, and 22-24 have been fully considered but they are not persuasive. 

As to claims 1 and 22, the Applicant argues that Richardson does not teach the 
newly added limitation of "whereby all of the identified one or more words, phrases...." 
The Examiner respectfully disagrees with this assertion upon reconsideration of 
Richardson in view of the added limitation. It is noted that the identification of one or 
more words is taught by Crepy in col. 3, lines 36-39 and lines 40-43, where words are 
determined from the input that is loaded from memory. The secondary reference of 
Richardson teaches the grouping of all of the identified one or more words, phrases or 
utterances, where the identified words from the input text of "add" and "cant" are 
confused with "ad" and "can't". These words, which are inputted and identified, are 
categorized according to the table of confusable words as seen in Figure 4. The word 
"add", for example, is grouped with "ad", which is of the same grammar of confusability. 
The grouping and selection is evident in col. 5, lines 50-60, where an output to the user 
is made for the confused words. The fact that the grouping takes place in advance does 
not impact the claimed limitations due to their broad claim scope, where the limitations 
for categorizing and grouping do not prevent them from utilizing a pre-existing table for 
the categorizing and grouping. 
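
As an informal illustration of the point above, and not as a restatement of Richardson's actual implementation, grouping identified words against a pre-existing confusable-word table might be sketched as follows. The names CONFUSABLE_TABLE and group_identified_words, and the sample input, are hypothetical; only the word pairs "add"/"ad" and "cant"/"can't" come from the discussion above.

```python
# Hypothetical pre-existing table: each entry maps a word to the group of
# words it is commonly confused with (pairs taken from the discussion above).
CONFUSABLE_TABLE = {
    "add": {"add", "ad"},
    "ad": {"add", "ad"},
    "cant": {"cant", "can't"},
    "can't": {"cant", "can't"},
}


def group_identified_words(words):
    """Group each identified input word with its confusable alternatives."""
    groups = {}
    for word in words:
        key = word.lower()
        if key in CONFUSABLE_TABLE:
            # The group is looked up, not computed: the table exists in advance.
            groups[key] = sorted(CONFUSABLE_TABLE[key])
    return groups


# Assumed sample input; words absent from the table are simply not grouped.
result = group_identified_words(["Please", "add", "it", "cant"])
```

The sketch shows that categorizing and grouping at run time can rely entirely on a table built beforehand, which is the reading applied to the claim language above.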

Hence, for the reasons mentioned above, no new references were introduced, and 
a mapping for the newly added limitation is found below. 

Claim Rejections - 35 USC § 103 

5. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth 
in section 102 of this title, if the differences between the subject matter sought to be patented and the 
prior art are such that the subject matter as a whole would have been obvious at the time the invention 
was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

6. Claims 1, 3-6, 10, 11, 15, 16, and 22 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Crepy et al. (US 6,622,121) in view of Richardson et al. (US 
5,999,896) in view of Raud et al. (US 6,125,341). 

As to claim 1, Crepy et al. teaches a method for testing and improving the 
performance of a speech recognition engine, comprising: 

loading into a memory location one or more words, phrases or utterances 
of plural grammar types (see col. 2, lines 64-66 and col. 3, lines 36-39) (e.g. The 
words inputted from the text contain various types of words and thus are of 
plural grammar types (i.e. subject or domain).); 

identifying one or more of the words, phrases or utterances for recognition 
by a speech recognition engine (see col. 3, lines 36-39 and col. 3, lines 40-43) 
(e.g. It is seen that the reference text, which consists of words, is identified and 
will be passed to the speech recognition engine); 

extracting the one or more words, phrases or utterances in a selected 
grammar sub-tree via a vocabulary extractor module and, passing the extracted 
one or more identified words, phrases or utterances to a text-to-speech 
conversion module that provides an audio formatted pronunciation of each word, 
phrase, or utterance (see col. 3, lines 36-46 and col. 1, lines 65-67) (e.g. The 
extracted words come from the reference text, which is then fed into the text to 
speech engine. An audio representation is produced as a result of the conversion 
of text into speech.); 

passing the audio pronunciation of each of the identified one or more 
words, phrases or utterances, from the text-to-speech conversion module to the 
speech recognition engine (see col. 4, lines 59-65 and Figure 4, elements 404 
and 406); 

creating a recognized word, phrase or utterance for each audio 
pronunciation passed to the speech recognition engine (see col. 4, lines 59-65) 
(e.g. It is seen that the words are recognized from the audio file and then 
compared.); and 

analyzing each recognized word, phrase or utterance created by the 
speech recognition engine to determine how closely each created recognized 
word, phrase or utterance approximates the respective audio pronunciation from 
which each created recognized word, phrase or utterance is derived (see col. 4, 
line 65 to col. 5, line 11) (e.g. It is seen that a comparison is done with regard to 
the recognized words and the actual words using the WER calculation.) 
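
For orientation only, the word error rate (WER) referred to above is conventionally computed as the word-level edit distance between the reference text and the recognized text, divided by the number of reference words. The sketch below follows that standard definition; it is an assumption that Crepy's calculation matches it exactly, and the function name and sample strings are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words processed so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Assumed example: one of four reference words is misrecognized.
wer = word_error_rate("please add the ad", "please ad the ad")
```

A lower value indicates that the recognized output more closely approximates the text from which the audio pronunciation was generated.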

However, Crepy et al. does not specifically teach the categorizing of the 
identified spoken words by grammar type, where same utterances are grouped 
together in a grammar sub-tree, and selection of a particular grammar sub-tree. 

Richardson et al. does teach use of spoken words (see col. 3, lines 39-42, 
voice recognizer allows user to input voice for conversion into text) 

categorizing the identified one or more words, phrases or utterances (see 
col. 3, lines 45-57, confusable words are identified and categorized based on a 
confusable word table) by grammar type (see Figure 4, and col. 4, lines 37-39, 
the confusable words are separated by type of confusable word pair, 
alphabetically) whereby all of the identified one or more words, phrases or 
utterances of a same grammar type (see col. 4, line 2 and lines 46-49, where 
the word "add", which is identified, is grouped and categorized with respect to the 
confused word) are grouped together in a grammar sub-tree (see Figure 4; for 
example, for the word "their", the words "there" and "they're" are grouped together as 
other possible words for grammar type "their") 

selecting a particular grammar sub-tree (see col. 5, lines 47-59, user is 
presented with choices of a grammar sub-tree for grammar of confusable word 
that was identified (see Figure 7)) 

It would have been obvious to one of ordinary skill in the art at the time 
the invention was made to have modified the improving of speech recognition as 
taught by Crepy et al. with the inclusion of categorizing words according to a 
specific grammar as taught by Richardson. The motivation to have combined the 
references involves the ability to resolve commonly confused words (See 
Richardson et al. col. 1, lines 51-53). 

However, Crepy et al. in view of Richardson et al. do not specifically teach 
the assignment of a confidence score for each utterance, phrase, or word. 

Raud et al. teaches assigning a confidence score to each utterance, 
phrase or word (see col. 6, lines 8-21). 

It would have been obvious to one of ordinary skill in the art at the time 
the invention was made to have modified the improving of speech recognition as 
taught by Crepy et al. in view of Richardson et al. with the inclusion of assigning 
a confidence score as taught by Raud et al. The motivation to have combined the 
references involves the ability to determine if the current vocabulary is appropriate 
for recognizing words and to determine if a word is properly recognized (see 
Raud et al., col. 6, lines 8-13). 



As to claim 15, Crepy et al. in view of Richardson in view of Raud teach all the 
claimed limitations as applied to claim 1 above. 

Furthermore, Richardson teaches a plurality of grammar sub-trees are 
grouped together to form a grammar tree containing all of the one or more words, 
phrases, or utterances (see Figure 4) (e.g. The figure shows that a plurality of 
confusable words of different grammar types is shown with possible intended 
words or sub-trees that are linked to the candidate confusable word to form a 
confusable word table.) 



As to claim 16, Crepy et al. in view of Richardson in view of Raud teach all the 
claimed limitations as applied to claim 1 above. 

Furthermore, Crepy teaches the use of a speech recognition engine (see 
Crepy et al., Figure 4, element 406) 

Furthermore, Richardson teaches the identifying of an utterance includes 
selecting the grammar sub-tree containing the one or more words, phrases, or 
utterances (see col. 4, lines 57-61, parser identifies confusable words by 
referring to a table). 



As to claim 3, Crepy et al. in view of Richardson et al. in view of Raud et al. 
teaches all the claimed limitations as applied to claim 1 above. 

Furthermore, Raud et al. teaches the assigning of a confidence score to 
each recognized utterance based on a confidence level associated with the 
utterance based on prior speech recognition engine training (see Raud et al., col. 
6, line 8) (e.g. The confidence score is compared against a threshold for 
recognition accuracy (see col. 6, lines 23-31).) 



As to claims 4 and 10, Crepy et al. in view of Richardson et al. in view of Raud et 
al. teaches all the claimed limitations as applied to claims 1 and 3 above. 

Furthermore, Raud et al. teaches the determination being made of 
whether the recognized utterance is the same as the utterance derived by the 
speech recognition engine based on prior speech recognition training confidence 
level (see Raud et al., col. 4, lines 33-35) (e.g. It should be noted that there is a 
vocabulary used for checking if there is a match. An initial vocabulary is used, 
then other vocabularies are used for subsequent words not found or recognized 
using the initial vocabulary (see col. 5, lines 46-56). It is inherent that the words 
from the vocabulary and the words from the utterance are matched for similarity). 

As to claims 5 and 11, Crepy et al. in view of Richardson et al. in view of Raud et 
al. teach all the claimed limitations as applied to claims 1 and 2 above. 

Furthermore, Raud et al. teaches if the confidence score exceeds an 
acceptable level designating the recognized utterance as accurately recognized 
by the speech recognition engine (see Raud et al. col. 5, lines 18-30). 

As to claim 6, Crepy et al. in view of Richardson et al. in view of Raud et al. 
teaches all the claimed limitations as applied to claims 1, 2, and 5 above. 

Furthermore, Raud et al. teaches if the confidence score is less than a 
certain value, a modification is made to the speech recognition engine to 
recognize the word (see col. 6, lines 8-31) (e.g. If the confidence level is less 
than a value, the system requests verification from a user or asks a question to 
remove any ambiguity. This is seen as a modification to the speech recognition 
engine to interpret the utterance. Further, other vocabularies are used to 
determine whether an increase in performance can be obtained.). 
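
Purely as an illustration of the threshold logic discussed for claims 5 and 6, and not as a description of Raud's actual system, the decision could be sketched as below. The function name, the sample scores, and the threshold value are hypothetical, and treating a score exactly equal to the threshold as not accepted is an assumption.

```python
def triage_by_confidence(scored_words, threshold):
    """Split (word, score) pairs into accepted and flagged lists."""
    accepted, flagged = [], []
    for word, score in scored_words:
        if score > threshold:
            accepted.append(word)  # designated as accurately recognized
        else:
            flagged.append(word)   # candidate for verification or modification
    return accepted, flagged


# Assumed sample scores and threshold, chosen only for illustration.
accepted, flagged = triage_by_confidence(
    [("add", 0.92), ("cant", 0.41)], threshold=0.6
)
```

Words in the flagged list correspond to the case where the score falls below the acceptable level and some follow-up action is taken.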

As to claim 22, Crepy et al. teaches a method for testing and improving the 
performance of a speech recognition engine, comprising: 

identifying one or more of the words, phrases or utterances for recognition 
by a speech recognition engine (see col. 3, lines 36-39 and col. 3, lines 40-43) 
(e.g. It is seen that the reference text, which consists of words, is identified and 
will be passed to the speech recognition engine); 

creating and passing the audio pronunciation of each of the identified one 
or more words, phrases or utterances, from the text-to-speech conversion 
module to the speech recognition engine that provides an audio formatted 
pronunciation of each of the identified words, phrases, or utterances to the 
speech recognition engine (see col. 4, lines 59-65 and Figure 4, elements 404 
and 406) (e.g. It is seen from the cited section that an audio version is created of 
the input speech and passed to the speech recognition engine.); 

deriving a recognized word, phrase or utterance for each audio 
pronunciation passed to the speech recognition engine (see col. 4, line 65 to col. 
5, line 11) (e.g. It is seen that a comparison is done with regard to the 
recognized words and the actual words using the WER calculation.) 

However, Crepy et al. does not specifically teach the categorizing by a 
grammar type where same utterances are grouped together in a grammar sub-tree. 

Richardson et al. does teach use of spoken words (see col. 3, lines 39-42, 
voice recognizer allows user to input voice for conversion into text) 

categorizing the identified one or more words, phrases or utterances (see 
col. 3, lines 45-57, confusable words are identified and categorized based on a 
confusable word table) by grammar type (see Figure 4, and col. 4, lines 37-39, 
the confusable words are separated by type of confusable word pair, 
alphabetically) whereby all of the identified one or more words, phrases or 
utterances of a same grammar type (see col. 4, line 2 and lines 46-49, where 
the word "add", which is identified, is grouped and categorized with respect to the 
confused word) are grouped together in a grammar sub-tree (see Figure 4; for 
example, for the word "their", the words "there" and "they're" are grouped together as 
other possible words for grammar type "their") 

selecting a particular grammar sub-tree (see col. 5, lines 47-59, user is 
presented with choices of a grammar sub-tree for grammar of confusable word 
that was identified (see Figure 7)) 

It would have been obvious to one of ordinary skill in the art at the time 
the invention was made to have modified the improving of speech recognition as 
taught by Crepy et al. with the inclusion of categorizing words according to a 
specific grammar as taught by Richardson. The motivation to have combined the 
references involves the ability to resolve commonly confused words (See 
Richardson et al. col. 1, lines 51-53). 

However, Crepy et al. in view of Richardson et al. do not specifically 
teach the assignment of confidence score for each utterance, phrase, or word. 

Raud et al. teaches assigning a confidence score to each utterance, 
phrase or word (see col. 6, lines 8-21) based on prior training of the speech 
recognition engine to recognize similar or same words, phrases or utterances as 
each derived recognized word, phrase or utterance (see Raud et al., col. 4, 
lines 33-35) (e.g. It should be noted that there is a vocabulary used for checking 
if there is a match. An initial vocabulary is used, then other vocabularies are used 
for subsequent words not found or recognized using the initial vocabulary (see 
Raud et al., col. 5, lines 46-56). It is inherent that the words from the vocabulary 
and the words from the utterance are matched for similarity); and 

if the confidence score is less than an acceptable threshold, modifying the 
speech recognition engine to recognize with higher accuracy the word, phrase or 
utterance from which the derived recognized word, phrase or utterance is derived 
(see col. 5, lines 31-38 and col. 6, lines 22-51). 

It would have been obvious to one of ordinary skill in the art at the time 
the invention was made to have modified the improving of speech recognition as 
taught by Crepy et al. and Richardson et al. with the inclusion of assigning 
a confidence score as taught by Raud et al. The motivation to have combined the 
references involves the ability to determine if the current vocabulary is appropriate 
for recognizing words and to determine if a word is properly recognized (see col. 
6, lines 8-13). 

7. Claim 7 is rejected under 35 U.S.C. 103(a) as being unpatentable over Crepy et 
al. in view of Richardson et al. and Raud et al. as applied to claim 5 above, and further 
in view of Bickley et al. (US 7,013,276). 

As to claim 7, Crepy et al., Richardson et al. and Raud et al. teach improving 
the performance of a speech recognition engine. 

However, Crepy et al., Richardson et al. and Raud et al. do not 
specifically teach the notification to a developer when the score is lower than a 
threshold value. 

Bickley et al. teaches an alert mechanism for words that are similar and are 
subject to confusion (see col. 10, lines 63-65) from threshold calculation (see col. 
10, lines 38-40). 

It would have been obvious to one of ordinary skill in the art to modify 
the speech recognition performance methods as taught by Crepy et al., 
Richardson et al. and Raud et al. with the use of a notification sent to a software 
developer when value is below threshold as taught by Bickley et al. The 
motivation to combine these references involves the distinguishing between 
similar words, which may not be recognized by speech recognition engines (see 
Bickley et al., col. 2, lines 27-36). 

8. Claim 17 is rejected under 35 U.S.C. 103(a) as being unpatentable over Crepy et 
al. in view of Richardson et al. in view of Raud as applied to claim 1 above, and further 
in view of Kennewick et al. (US 2004/0044516). 

As to claim 17, Crepy et al. in view of Richardson et al. in view of Raud teach all 
the claimed limitations as applied to claim 1. 

Furthermore, Crepy et al. teaches the creating of a recognized word, 
phrase, or utterance for each respective audio pronunciation includes converting 
each respective audio pronunciation from an audio format to a digital format by 
the speech recognition engine (see Crepy et al., col. 4, lines 56-64) (e.g. It is 
seen that the audio form of the file is converted into the digital form. The words 
contain an implied pronunciation of the words.). 

However, Crepy et al. in view of Richardson et al. in view of Raud do not 
specifically teach the analyzing phonetically each respective audio pronunciation 
of each of the one or more recognized word, phrase or utterance. 

Kennewick et al. does teach the analyzing phonetically each respective audio 
pronunciation of each of the one or more recognized word, phrase or utterance 
(see [0151]). 

It would have been obvious to one of ordinary skill in the art at the time 
the invention was made to have modified the improving of speech recognition as 
taught by Crepy et al. and Richardson et al. with the inclusion of analyzing the 
phonetics of each audio pronunciation. The motivation to have combined the 
references involves the ability to add pronunciations not present in the dictionary in order 
to increase speech recognition accuracy and learning (see Kennewick et al., 
[0151]). 

9. Claims 8 and 23 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Crepy et al., Richardson et al. and Raud et al. as applied to claims 6 and 22 above, and 
further in view of Kennewick et al. (US 2004/0044516). 

As to claims 8 and 23, Crepy et al., Richardson et al. and Raud et al. teach all 
the claimed limitations as applied to claims 1, 5, and 6 above and claim 22. 
Furthermore, Raud et al. teaches the assigning of a confidence score and, if the score is less than a 
threshold, obtaining an acceptable confidence score upon next pass through the engine 
(see col. 7, lines 20-25). 



Application/Control Number: 10/647,709 Page 15 

Art Unit: 2626 

However, Crepy et al., Richardson et al. and Raud et al. do not 
specifically teach the altering of the audio pronunciation with the confidence 
score less than an acceptable threshold. 

Kennewick et al. does teach the altering of audio pronunciation of the 
word, phrase, or utterance associated with the confidence score that is less than 
an acceptable confidence score threshold level such that the altered audio 
pronunciation obtains an acceptable confidence score upon next pass through 
the speech recognition engine (see [0151]). (e.g. The speech recognition engine 
is adaptive based on the confidence levels and the pronunciation of the word 
recognized.) 

It would have been obvious to one of ordinary skill in the art at the time 
the invention was made to have modified the improving of speech recognition as 
taught by Crepy et al., Richardson et al. and Raud et al. with the inclusion of 
altering the audio pronunciation of the recognized word as taught by Kennewick 
et al. The motivation to have combined the references involves the ability to 
improve the accuracy of the speech recognition engine as well as the ability for 
the speech recognition engine to learn with time (see Kennewick et al., [0151]). 

10. Claims 9 and 24 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Crepy et al. in view of Richardson et al. and in view of Raud et al. as applied to claims 
6 and 22 above, and further in view of Roberts et al. (US 6,999,930). 

As to claims 9 and 24, Crepy et al. in view of Richardson et al. in view of Raud 
et al. teach all the claimed limitations as applied to claims 1, 5, and 6 above and claim 
22. Furthermore, Raud et al. teaches the use of a confidence score (see col. 6, lines 23-31). 

Crepy et al. in view of Richardson et al. in view of Raud et al. do not 
specifically teach the reduction of the confidence threshold level. 

However, Roberts does teach the reduction of the confidence score 
threshold level (see col. 10, lines 50-60). 

It would have been obvious to one of ordinary skill in the art at the time 
the invention was made to have modified the improving of speech recognition as 
taught by Crepy et al. in view of Richardson et al. in view of Raud et al. with the 
inclusion of reducing the acceptable confidence score threshold level 
as taught by Roberts et al. The motivation to have combined the references 
involves the ability to generate more potential matches even when the 
confidence level is low (see Roberts et al., col. 10, lines 57-60). 

Conclusion 

11. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. 

Ushioda (US 5,835,893) is cited to disclose word clustering for speech 
recognition. Epstein (US 2003/0055623) is cited to disclose determination and 
replacement of additional phrases based on an attribute. 

12. Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to PARAS SHAH whose telephone number is (571)270-1650. 
The examiner can normally be reached on MON.-THURS. 7:00 a.m.-4:00 p.m. 
EST. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, David Hudspeth, can be reached on (571)272-7843. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/David R Hudspeth/ 

Supervisory Patent Examiner, Art Unit 2626 

/Paras Shah/ 
Examiner, Art Unit 2626 

06/04/2009 



